Far-Field Automatic Speech Recognition
Authors
Abstract
The machine recognition of speech spoken at a distance from the microphones, known as far-field automatic speech recognition (ASR), has received a significant increase in attention in science and industry, which caused or was caused by an equally significant improvement in recognition accuracy. Meanwhile, it has entered the consumer market, with digital home assistants with a spoken language interface being its most prominent application. Speech recorded at a distance is affected by various acoustic distortions, and consequently, quite different processing pipelines have emerged compared with ASR for close-talk speech. A signal enhancement front end for dereverberation, source separation, and beamforming is employed to clean up the speech, and the back-end ASR engine is robustified by multicondition training and adaptation. We will also describe the so-called end-to-end approach to ASR, a new promising architecture that has recently been extended to the far-field scenario. This tutorial article gives an account of the algorithms used to enable accurate speech recognition from a distance, and it will be seen that, although deep learning has a significant share in the technological breakthroughs, a clever combination with traditional signal processing can lead to surprisingly effective solutions.
Similar resources
Double the trouble: handling noise and reverberation in far-field automatic speech recognition
Far-field microphone speech signals cause high error rates for automatic speech recognition systems, due to room reverberation and lower signal-to-noise ratios. We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from close-talking microphones. In an earlier paper, we showed impr...
Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions
Recent automatic speech recognition (ASR) results are quite good when the training data is matched to the test data, but much worse when they differ in some important regard, like the number and arrangement of microphones or differences in reverberation and noise conditions. This paper proposes an unsupervised spatial clustering approach to microphone array processing that can overcome such tra...
Tracking and Far-Field Speech Recognition for Multiple Simultaneous Speakers
In prior work, we developed a speaker tracking system based on an extended Kalman filter using time delays of arrival (TDOAs) as acoustic features. While this system functioned well, its utility was limited to scenarios in which a single speaker was to be tracked. In this work, we remove this restriction by generalizing the IEKF, first to a probabilistic data association filter, which incorpora...
Feature mapping using far-field microphones for distant speech recognition
Acoustic modeling based on deep architectures has recently gained remarkable success, with substantial improvement of speech recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, the multi-channel deep neural network based approaches rely on the powerful modeling capability of deep neural network (DNN) to learn suitable representation of dista...
Hilbert Envelope Based Features for Far-Field Speech Recognition
Automatic speech recognition (ASR) systems, trained on speech signals from close-talking microphones, generally fail in recognizing far-field speech. In this paper, we present a Hilbert Envelope based feature extraction technique to alleviate the artifacts introduced by room reverberations. The proposed technique is based on modeling temporal envelopes of the speech signal in narrow sub-bands u...
Journal
Journal title: Proceedings of the IEEE
Year: 2021
ISSN: 1558-2256, 0018-9219
DOI: https://doi.org/10.1109/jproc.2020.3018668